Data-Driven Synthesis of Fundamental Frequency Contours for TTS Systems Based on a Generation Process Model

Authors

Keikichi Hirose

Nobuaki Minematsu

Masaya Eto

Abstract

A data-driven method of fundamental frequency (F0) contour synthesis was developed for Japanese text-to-speech (TTS) conversion systems. In the method, synthesis is done using the F0 contour generation process model, and the model parameters for each accent phrase are estimated using statistical methods. Although it had already been shown that the synthesized F0 contours sounded as natural as those generated by heuristic rules arranged by experts, quality occasionally degraded depending on the sentence to be synthesized. In the current paper, information on sentence structure, which is automatically obtainable through the parsing process, is added to the input parameters of the statistical methods to obtain a better estimation. The experimental results showed that the new parameter was effective in improving the estimation of phrase components in particular. Furthermore, data-driven estimation of accent phrase boundaries for input text, a necessary step to realize TTS conversion, was also realized in a similar way. The rate of correct estimation reached 90%.
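The abstract does not spell out the generation process model itself. The sketch below assumes the standard formulation of that model (Fujisaki's model), in which the log F0 contour is a base value plus phrase and accent components: ln F0(t) = ln Fb + Σ Ap_i Gp(t − T0_i) + Σ Aa_j [Ga(t − T1_j) − Ga(t − T2_j)]. It shows only the generation step, that is, turning per-phrase command values into an F0 contour; the statistical prediction of those commands from linguistic input, which is the paper's actual contribution, is not reproduced. The constants α = 3.0/s, β = 20.0/s and γ = 0.9 are commonly used defaults assumed here, and all function and class names are hypothetical.

```python
import math
from dataclasses import dataclass

# Assumed, commonly used constants of the generation process model (not taken from the paper).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling applied to the accent control response


@dataclass
class PhraseCommand:  # hypothetical container for one phrase command
    onset: float      # T0: impulse time (s)
    amplitude: float  # Ap: impulse magnitude


@dataclass
class AccentCommand:  # hypothetical container for one accent command
    onset: float      # T1: step onset (s)
    offset: float     # T2: step end (s)
    amplitude: float  # Aa: step amplitude


def phrase_response(t: float, alpha: float = ALPHA) -> float:
    """Impulse response Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t: float, beta: float = BETA, gamma: float = GAMMA) -> float:
    """Step response Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t: float, fb: float, phrases, accents) -> float:
    """ln F0(t): log base frequency plus all phrase and accent components at time t."""
    value = math.log(fb)
    for p in phrases:
        value += p.amplitude * phrase_response(t - p.onset)
    for a in accents:
        value += a.amplitude * (accent_response(t - a.onset) - accent_response(t - a.offset))
    return value


def synthesize_f0(duration: float, fb: float, phrases, accents, frame_shift: float = 0.01):
    """Return F0 values in Hz sampled every frame_shift seconds from t = 0 to duration."""
    n_frames = int(round(duration / frame_shift)) + 1
    return [math.exp(log_f0(i * frame_shift, fb, phrases, accents)) for i in range(n_frames)]


if __name__ == "__main__":
    # Illustrative command values only; in the paper these would come from statistical prediction.
    phrases = [PhraseCommand(onset=-0.2, amplitude=0.5)]
    accents = [AccentCommand(onset=0.1, offset=0.4, amplitude=0.4)]
    contour = synthesize_f0(1.5, fb=80.0, phrases=phrases, accents=accents)
    print(f"{len(contour)} frames, peak F0 = {max(contour):.1f} Hz")
```

Working in the log domain keeps the components additive, which is why the sketch sums them in `log_f0` and exponentiates only when producing Hz values.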


Similar resources

Generating fundamental frequency contours for speech synthesis in Yorùbá

We present methods for modelling and synthesising fundamental frequency (F0) contours suitable for application in text-to-speech (TTS) synthesis of Yorùbá (an African tone language). These methods are discussed and compared with a baseline approach using the HMM-based speech synthesis system HTS. Evaluation is done by comparing ten-fold cross validation squared errors on a small corpus of four s...

Corpus-based synthesis of fundamental frequency contours based on a generation process model

A model-constrained corpus-based synthesis strategy was developed for fundamental frequency (F0) contours of Japanese sentences. In the training phase, the relationship between linguistic factors and the command values (amplitudes and locations) of the F0 contour generation process model was learned for a prediction module (a neural network in the current paper). Input parameters consist of linguisti...

A target approximation intonation model for Yorùbá TTS

A complete intonation model based on quantitative target approximation is described for Yorùbá text-to-speech (TTS) synthesis. This model is evaluated analytically and perceptually and compared to a fundamental frequency (F0) model using the standard HTS implementation. Analytical results suggest that the proposed approach more efficiently models F0 contours given typical data constraints in un...

Using F0 Contour Generation Process Model for Improved and Flexible Control of Prosodic Features in HMM-based Speech Synthesis

The generation process model of fundamental frequency contours, known as Fujisaki's model, is ideal for representing global features of prosody. It is a command-response model, in which the commands have clear relations with the linguistic and para/non-linguistic information included in the utterance. Therefore, by controlling fundamental frequency contours in the framework of the generation process model, a mo...

Smooth contour estimation in data-driven pitch modelling

Apple's next-generation text-to-speech (TTS) system in MacOS X uses a superpositional pitch model, comprising a relatively smooth underlying F0 contour and a separate contribution from the influence of the phonetic segments. This paper focuses on the data-driven modelling of the underlying contour, based on electroglottographic signals obtained from a corpus of reiterant speech. F0 extraction fr...

Publication date: 2002